
Check stride on preallocated output for matmul (fixes #15286) #15288

Merged
1 commit merged into master on Mar 1, 2016

Conversation


timholy (Member, Author) commented Feb 29, 2016

Easy-click link: #15286.
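A minimal sketch of the kind of guard this PR adds (hypothetical helper name, not the actual Base code): BLAS `gemm!` requires unit stride along the first dimension of the preallocated output, so a non-unit-stride output must take the generic matmul path instead.

```julia
# Hypothetical sketch: BLAS can only write into an output whose first
# stride is 1 (contiguous columns); otherwise fall back to generic matmul.
safe_for_blas(C::AbstractMatrix) = stride(C, 1) == 1

M = zeros(4, 4)
C_ok  = view(M, :, 1:2)      # contiguous columns: stride(C_ok, 1) == 1
C_bad = view(M, 1:2:4, :)    # every other row:    stride(C_bad, 1) == 2

safe_for_blas(C_ok)    # true
safe_for_blas(C_bad)   # false
```

Before this fix, handing a view like `C_bad` to the BLAS path produced wrong results (#15286); checking the stride routes it to the generic fallback instead.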

tkelman (Contributor) commented Mar 1, 2016

LGTM. Andreas?

(aside: BLIS https://github.com/flame/blis would allow arbitrary strides here)

andreasnoack added a commit referencing this pull request on Mar 1, 2016:
Check stride on preallocated output for matmul (fixes #15286)
andreasnoack merged commit c3b372a into master on Mar 1, 2016
andreasnoack deleted the teh/matmul_subarray branch on Mar 1, 2016 at 15:07
andreasnoack (Member) commented

@tkelman We allow arbitrary strides, so I'm wondering how much speedup BLIS can get on matrices with special strides.

tkelman (Contributor) commented Mar 1, 2016

Yeah, it depends on how well optimized the non-unit-stride case is in BLIS relative to Julia's generic gemm. Probably not commonly benchmarked, but worth trying.

timholy (Member, Author) commented Mar 1, 2016

When strides get so big that there's only one element per cache line, I suspect the best performance might be achieved by copying first (which essentially compacts the data).
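The copy idea above can be illustrated directly (a hedged sketch, not a benchmark): materialize the strided view into a contiguous `Array` before multiplying, rather than operating on the view in place. Both routes give the same product.

```julia
# With a large stride there is roughly one useful element per cache line,
# so compacting the view into a contiguous Array first can pay off.
M = rand(100, 100)
A = view(M, 1:10:100, 1:10:100)   # 10x10 view with stride 10 in memory
B = rand(10, 10)

C_copy   = Matrix(A) * B   # compact the data, then a contiguous multiply
C_direct = A * B           # multiply straight on the strided view
C_copy ≈ C_direct          # same result either way
```

Whether the copy wins depends on sizes and strides; the point is only that compaction trades an O(mk) copy for cache-friendly access in the O(mkn) multiply.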

tkelman (Contributor) commented Mar 1, 2016

I believe BLIS does copy in the non-unit-stride case in order to still use optimized SIMD operations, but only one panel at a time rather than the entire array. I should reread their papers and code, though. On typical dgemm their Haswell kernels are quite competitive with OpenBLAS and MKL.
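The panel-at-a-time packing described above can be sketched in a few lines (an assumption about the strategy, not BLIS's actual code; `packed_mul` and its `panel` parameter are hypothetical): pack one panel of the strided input into a small contiguous buffer, multiply that panel, and move on, so the working set stays cache-resident.

```julia
# Rough sketch of panel packing: copy one block of rows of the (possibly
# strided) input into a contiguous buffer before multiplying it.
function packed_mul(A::AbstractMatrix, B::AbstractMatrix; panel::Int = 4)
    m, k = size(A)
    n = size(B, 2)
    C = zeros(promote_type(eltype(A), eltype(B)), m, n)
    buf = Matrix{eltype(A)}(undef, panel, k)   # one contiguous panel buffer
    for i in 1:panel:m
        rows = i:min(i + panel - 1, m)
        p = view(buf, 1:length(rows), :)
        copyto!(p, view(A, rows, :))           # pack: strided -> contiguous
        C[rows, :] = p * B                     # SIMD-friendly panel multiply
    end
    return C
end

M = rand(40, 20)
A = view(M, 1:2:40, 1:2:20)    # 20x10 strided view
B = rand(10, 5)
packed_mul(A, B) ≈ A * B       # matches the direct product
```

Real BLIS packs both operands into tiled layouts matched to its microkernel; this sketch only shows why packing one panel at a time bounds the extra memory to a single buffer rather than a full copy of the array.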

timholy (Member, Author) commented Mar 1, 2016

Interesting. I'd be surprised if we couldn't someday match them in pure Julia with @threads and vectorization, but we aren't there yet.

tkelman (Contributor) commented Mar 1, 2016

Yeah, the basic code-generation patterns they use would all translate naturally into Julia-style code generation (more nicely, actually, since they lean heavily on the C preprocessor), and we could use their kernels as inline LLVM IR or assembly. Some day.

tkelman pushed a commit referencing this pull request on Mar 7, 2016